All timings run on an A1000 w/ 68010, 4Meg of fast ram & 1/2Meg of
chip ram, off of a Supra 60 drive. My standard work environment was in
place: interlaced morerow'ed WB screen and 50K stack, with the
following active processes:
Task  Pri  Address  Command                 Directory
   1    0  251948   jobs                    src:treewalk
   2    0  250878   emacs                   src:treewalk
   3    0  213e40   SupraMount
   4   20  24fb38   bin:startprogs/machII   RAM DISK:
   6    0  2552f0   bin:startprogs/wicon    RAM DISK:
In addition, snipit, ARexx 1.10, srt, FF, conman, installbeep, the wb
and mymenu were in place.
The current directory was src:treewalk, the directory tree scanned was
rooted at src:tmp, consisting of about 20Meg of random stuff. It
wasn't changed throughout.
Find is a PA find, available on fish disk 197. Treewalk is the binary
included in this distribution. files is Lattice's files command,
version 1.01, from Lattice C 5.02.
Though files & treewalk are residentable, find is not. Therefore, all
three commands were run non-resident to even the field. The object was
to measure the algorithms used, not the implementation details.
First test: walking a large tree.
Timings labeled "no output" were made from Rexx, via a script that
ran each command 10 times, following each run by the time elapsed
during the run, measured in seconds. The output was run through "grep
-v src:" to throw away all output but timings and error messages.
Timings labeled "output" consisted of one run, with the output going
to standard out.
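The "no output" procedure can be sketched as follows. This is a rough
Python stand-in for the Rexx script, which isn't reproduced here; the
names time_runs and average are made up for illustration, and the
command under test is passed in as a callable rather than launched
from a shell.

```python
import time


def time_runs(run, repeats=10):
    """Call run() `repeats` times; return the elapsed seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return timings


def average(timings):
    """Arithmetic mean of a non-empty list of timings."""
    return sum(timings) / len(timings)
```

Each "average:" line below is the plain mean of the ten figures on the
line above it.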
find src:tmp -print, output: 115.62, no output:
28.04 28.00 28.14 28.08 28.10 28.02 28.02 28.08 28.00 28.18
average: 28.066
treewalk dir src:tmp, output: 122.48, no output:
46.66 46.54 46.78 46.60 46.48 46.54 46.74 46.72 46.68 46.46
average: 46.620
files src:tmp, output: 240.70, no output:
163.58 163.64 163.50 163.30 163.54 163.96 164.10 163.44 163.00 163.36
average: 163.542
Note: files complained about multiple directories having too many files
or being empty. It doesn't state which.
Second test: listing files from a large file system that need to be
backed up.
This was lifted from my backup script, which normally processes the
output. To make this more realistic, the output was run through "grep
-v src:", which matches the actual use during backup (being run into
execio for processing by the Rexx backup script). Once again 10
iterations were run. Note: only two files actually met the entire
selection criteria, which isn't unusual.
find src:tmp -type f -newer src:last-backup ! -name *~ ! -name *.o -print
24.88 25.06 25.06 25.08 25.12 25.10 25.16 25.10 25.22 25.20
average: 25.098
treewalk dir src:tmp filter "file && src:last-backup.date < date
&& !(filename *= '*~' || filename *= '*.o')"
25.24 25.42 25.32 25.48 25.48 25.52 25.34 25.24 25.42 25.38
average: 25.384
Files is unable to perform this search, as it lacks the ability to
test for files not matching a name.
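For readers who know neither syntax, the selection criteria used in
this test can be restated as a small Python sketch. It is only an
illustration of the logic, not how either tool is implemented: the
names needs_backup and files_to_back_up are hypothetical, Python's
fnmatch patterns stand in for the tools' own wildcards, and file
modification times stand in for the dates the tools compare.

```python
import fnmatch
import os


def needs_backup(path, last_backup_mtime, excluded=("*~", "*.o")):
    """True for a regular file modified after the last backup whose
    name matches none of the excluded patterns."""
    if not os.path.isfile(path):
        return False
    if os.path.getmtime(path) <= last_backup_mtime:
        return False
    name = os.path.basename(path)
    return not any(fnmatch.fnmatchcase(name, pat) for pat in excluded)


def files_to_back_up(root, last_backup_mtime):
    """Yield every path under root that passes needs_backup()."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if needs_backup(path, last_backup_mtime):
                yield path
```

The negated name tests in the last line of needs_backup are the part
that files cannot express.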
Third test: cleaning up a large working directory.
A copy of the tree was created, and the copy was deleted in two passes:
first, all files matching "*.o" were deleted, and then everything else
was deleted. The deletion utility is "rm", which is a version of
delete without the limit on the number of arguments. This lets
treewalk avoid invoking the command multiple times. While this
may seem unfair to find, part of the purpose of creating treewalk was
to overcome this limitation in find. To make the test realistic, rm
was resident for all runs of the test. Files has an option to delete
files, which was used so that files would run in reasonable
time. The sources to "rm" are available upon request.
To avoid having to copy the tree multiple times, this test was run
only one time for each command. Since the multiple run tests show
little variance, it isn't expected that these will show much variance
either.
                                                           time
find tmp:mg -name *.o -exec rm "{}" ";"                   53.76
find tmp:mg -exec rm "{}" ";"                            150.68
Note: Find doesn't support AmigaDOS wildcarding.
Note: Find failed to delete any directories during the second phase of
the trial, even though it deleted all regular files.
treewalk dir tmp:mg filter "filename#='#?.o'" rm          24.40
treewalk post dir tmp:mg rm                               50.66
Note: to ensure that directories are seen after files, treewalk needs
to be told to do a postorder traversal of the tree during the second
phase.
Note: treewalk did not delete the top-level directory, but this is to
be expected from its documentation.
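Why postorder matters for the second phase can be seen in a short
Python sketch. It assumes a tree with no symbolic links, the name
remove_contents is made up, and it illustrates the ordering only; it
is not treewalk's code.

```python
import os


def remove_contents(root):
    """Delete everything under root, leaving root itself in place."""
    # topdown=False yields a directory only after all of its
    # subdirectories, so each subdirectory is already empty by the time
    # its parent tries to rmdir it.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            os.remove(os.path.join(dirpath, name))
        for name in dirnames:
            os.rmdir(os.path.join(dirpath, name))
```

A preorder walk would reach each directory while it still had
contents, and the removal would fail. Like the treewalk run above,
the sketch leaves the top-level directory behind.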
files -rerase -name #?.o tmp:mg                          353.14
files -rerase tmp:mg                                     159.88
Note: files complains that it can't delete certain directories during
the first phase. This is odd and somewhat annoying.
As a couple of asides, I ran the filtered treewalk file removal,
forcing treewalk to run a single copy of rm for each file to delete
(the same behavior that find uses) to gain some measure of how
important the ability to stack file names on a command is. I then ran
the full delete using the standard AmigaDOS delete command, to see how
that compared with the other cases.
treewalk dir tmp:mg filter "filename#='#?.o'" single rm   43.38
delete tmp:mg all quiet                                   32.64
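The difference between the two modes is just how the matched names
are turned into command invocations. A minimal Python sketch, with a
hypothetical build_commands helper that only constructs the argument
lists and runs nothing:

```python
def build_commands(command, paths, single=False):
    """Return the argument vectors that would be executed.

    single=True gives one invocation per path (what find's -exec does);
    otherwise all paths are stacked onto one invocation.
    """
    paths = list(paths)
    if single:
        return [[command, path] for path in paths]
    return [[command, *paths]] if paths else []
```

For three matching files, the stacked form starts rm once and the
single form starts it three times; the 24.40 versus 43.38 figures
above are the cost of those extra program launches over the whole
tree.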
Final note: program sizes.
find      13044  ----rwed  19-Apr-89  02:29:43
treewalk  19904  --p-rwed  Today      21:37:31
files     24096  --p-rwed  19-Apr-89  02:30:22
Some statistics:
Running time as a percentage of the slowest program. Times for
multiple run tests are the average.
     program         files   find   treewalk   <aside>
test
1 output               100     48         51
1 no output            100     17         29
2             not possible     99        100
3 filtered             100     15          7        12
3 unfiltered           100     94         32        20
total 1 & 3            100     38         26
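Each per-test row is the program's time divided by the slowest time
in that row, rounded to a whole percent. A small Python sketch of the
arithmetic (the function name percent_of_slowest is made up here):

```python
def percent_of_slowest(times):
    """Map each program's time to a whole-number percentage of the
    slowest time in the same test."""
    slowest = max(times.values())
    return {name: round(100 * t / slowest) for name, t in times.items()}


# Test 1 with output, using the figures reported above:
# percent_of_slowest({"files": 240.70, "find": 115.62, "treewalk": 122.48})
# gives {"files": 100, "find": 48, "treewalk": 51}
```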
Conclusions:
It should be clear that files is the least worthwhile tool of the lot.
It's far slower than either of the other two, not as flexible, and
much larger. Its inability to distinguish between an empty directory
and too many files in a directory is a serious handicap for unattended
use on large devices. That source is available for the other two tools,
but not for files, doesn't help. Finally, its insistence on blaming
Lattice for its existence every time it starts just adds insult to
injury.
Find appears to do the actual directory scanning faster than
treewalk, but does most everything else slower. Possibly moving to a
newer compiler technology would change this lack of speed. However,
its inability to execute a command with multiple file arguments seems
to be a major performance hit, and that appears to be inherent in its
user interface, and not solvable without a major redesign (i.e.,
treewalk). It is less flexible than treewalk, not having the ability
to do things like select all files that were last modified on a
particular date. However, it is smaller, which could be a benefit in
disk-tight situations.
Treewalk does ok for speed, but not wonderfully. In particular, if
there is no filtering and the output is going to memory instead of the
console, it runs at slightly better than 1/2 the speed of find. This
cost is probably incurred by 1) not using the stack to store the
visitation history, so as to avoid consuming a vital resource, and 2)
using a general treewalking routine instead of one that's inseparable
from the program. However, its ability to select which files to
process is better than either alternative. In particular, rather than
choosing a small set of primitives about files and hardwiring them
into the program, it allows the user to access the data in the file's
FileInfoBlock, and manipulate it via C-like expressions. The addition
of the ability to use ARexx macros as primitives is of unknown
utility, but does allow treewalk to mimic the multiple-exec and the
'-ok' features of find.
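The two design points named above (history kept off the call stack,
and a general routine that takes the per-file action as a parameter)
can be illustrated with a short Python sketch. This is not treewalk's
source; the name walk and the visit callback are assumptions made for
the example.

```python
import os


def walk(root, visit):
    """Call visit(path) for every entry under root.

    Directories still to be scanned are kept in an ordinary list on
    the heap instead of in recursive call frames, and the per-entry
    action is supplied by the caller.
    """
    pending = [root]
    while pending:
        directory = pending.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                visit(entry.path)
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
```

Calling walk("some/dir", print) lists the tree. The indirect call to
visit and the bookkeeping on the pending list are the kind of overhead
that a scanner welded into its program can skip.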
The bottom line is that there is no technical reason to use files.
Find may be preferable in some cases, but treewalk is probably to be
preferred in the general case.
Copyright 1989, Mike W. Meyer
These files may be used and redistributed under the terms
found in the file LICENSE.